--- permalink: /spectre/simple-discovery/ --- Spectre - simple discovery workflow

<- Back to Spectre home page



Introduction


Overview

Here we provide a worked example of a ‘simple’ discovery analysis workflow, where the entire process (data prep, clustering, dimensionality reduction, cluster annotation, plotting, summary data, and statistical analysis) is contained within a single script. The ‘simple’ workflow is most suitable for fast analysis of small datasets. For larger or more complex datasets, or datasets with multiple batches, we recommend the general discovery workflow, where the data preparation, batch alignment, clustering/dimensionality reduction, and quantitative analysis are separated into separate scripts. The demo dataset used for this worked example are cells extracted from mock- or virally-infected mouse brains, measured by flow cytometry.

Strategy

The ‘simple’ and ‘general’ discovery workflows are designed to facilitate the analysis of large and complex cytometry datasets using the Spectre R package. We’ve tested up to 30 million cells in a single analysis session so far. The workflow is designed to get around the cell number limitations of tSNE/UMAP. The analysis starts with clustering with FlowSOM – which is fast and scales well to large datasets. The clustered data is then downsampled, and dimensionality reduction is performed with tSNE/UMAP. This allows for visualisation of the data, and the clusters present in the dataset. Once the possible cell types in the datasets have been explored, the clusters can be labelled with the appropriate cellular identities. Finally, we can use the clusters/populations to generate summary statistics (expression levels, frequencies, total counts etc), which allows us to create graphs and heatmaps, facilitating statistical analysis.

Multiple samples

To analyse multiple samples, all the files must be imported into the one analysis session. This allows cells from each session to be clustered and analysed together, and allows us to examine the differential expression of markers, or the differences in cell proportions, between experimental groups.

Batch alignment

The ‘simple’ discovery workflow does not include any batch alignment steps. If batch correction needs to be applied, we recommend using the general discovery workflow.




Citation

If you use Spectre in your work, please consider citing Ashhurst TM, Marsh-Wakefield F, Putri GH et al. (2020). bioRxiv. 2020.10.22.349563. To continue providing open-source tools such as Spectre, it helps us if we can demonstrate that our efforts are contributing to analysis efforts in the community. Please also consider citing the authors of the individual packages or tools (e.g. CytoNorm, FlowSOM, tSNE, UMAP, etc) that are critical elements of your analysis work. We have provided some generic text that you can use for your methods section with each protocol and on the ‘about’ page.

Sample methods blurb

Here is a sample methods blurb for this workflow. You may need to adapt this text to reflect any changes made in your analysis.

Computational analysis of data was performed using the Spectre R package (Ashhurst et al., 2020), with instructions and source code provided at https://github.com/ImmuneDynamics/spectre. Samples were initially prepared in FlowJo, and the population of interest was exported as raw value CSV files. Arcsinh transformation was performed on the data in R using a co-factor of 15 to redistribute the data on a linear scale and compress low end values near zero. The dataset was then merged into a single data.table, with keywords denoting the sample, group, and other factors added to each row (cell). The FlowSOM algorithm (Van Gassen et al., 2015) was then run on the merged dataset to cluster the data, where every cell is assigned to a specific cluster and metacluster. Subsequently, the data was downsampled and analysed by the dimensionality reduction algorithm Uniform Manifold Approximation and Projection (UMAP) (McInnes, Healy, Melville, 2018) for cellular visualisation.

Note

If you haven’t installed Spectre, please visit our Spectre installation page. If you aren’t familiar with using RStudio or Spectre, please see our RStudio and Spectre basic tutorials.



Before you start


Before you get started, we recommend trying out the R/RStudio and Spectre tutorials available on our getting st arted page, to familiarise yourself with R/RStudio first.



Setup


Software:

For instructions downloading R, RStudio, and Spectre, please see the getting started page.


Analysis script:

Please visit https://github.com/ImmuneDynamics/Spectre, and download the repository:


You can then find the ‘simple discovery workflow’ script here:


Create a folder for your experiment, and place the script in that folder.


Export the population of interest (POI)

Please see this page for detailed instructions on exporting data for Spectre. When using fluorescence data, please ensure you are exporting the data from compensated channels, indicated by ‘Comp-Channel Name’ (e.g. Comp-B515). Feel free to include any other relevant parameters as well (FSC, SSC, time etc). We recommend exporting CSV ‘scale’ value data, or alternatively exporting FCS files. These represent the untransformed data values. You are also welcome to use the CSV ‘channel’ value data, which uses a form of ‘binning’ to transform the data onto a linear distribution – please read this page for a more detailed explanation on CSV channel values.



Data folder

Create a folder within your experiment folder called ‘data’, and place the exported files there (see this page for more info).



Metadata folder

Create a folder within your experiment folder called ‘metadata’, and place a ‘sample.details.csv’ file there (see this page for more info).




1. Load packages and set directories


#######################################################################################################
#### 1. Load packages, and set working directory
#######################################################################################################

Running library(Spectre) will load the Spectre package (also known as a ‘library’). We can then use package.check() to see if the standard dependency packages are installed, and package.load() to load those packages.

    ### Load libraries

        library(Spectre)
        Spectre::package.check()    # Check that all required packages are installed
        Spectre::package.load()     # Load required packages

Here we can set our ‘primary’ directory, which is going to be the location where the R script is saved. This path will be stored as PrimaryDirectory.

Note: if you aren’t sure how to navigate directories in R, check out our brief introduction to R tutorial.

    ### Set PrimaryDirectory

        dirname(rstudioapi::getActiveDocumentContext()$path)
        setwd(dirname(rstudioapi::getActiveDocumentContext()$path))
        getwd()
        PrimaryDirectory <- getwd()
        PrimaryDirectory

We can then set our input directory, which will be the ‘data’ folder where we placed our files during setup. To do this we ask R to start at the location of the PrimaryDirectory, go up one level .., and then find the data folder.

    ### Set 'input' directory

        setwd(PrimaryDirectory)
        setwd("../data/")
        InputDirectory <- getwd()
        setwd(PrimaryDirectory)

We need to set the location of the ‘metadata’ folder. This is where we can store a CSV file that contains any relevant metadata that we want to embed in our samples. In this example, it is located in a sub-folder called ‘metadata’.

    ### Set 'metadata' directory

        setwd(PrimaryDirectory)
        setwd("../metadata/")
        MetaDirectory <- getwd()
        setwd(PrimaryDirectory)

We need to create a folder where our output data can go once our analysis is finished. In this example we will call this ‘Output_Spectre’.

    ### Create output directory

        dir.create("Output_Spectre", showWarnings = FALSE)
        setwd("Output_Spectre")
        OutputDirectory <- getwd()
        setwd(PrimaryDirectory)



2. Import and prep data


#######################################################################################################
#### 2. Import and prep data
#######################################################################################################

Import data

To begin, we will change our working directory to ‘InputDirectory’ and list all the CSV files in that directory – these should be the sample CSV files. We can then read in all of our samples (in this example, one CSV file per sample) into a list called ‘data.list’. Spectre uses the data.table framework to store data, which reads, writes, and performs operations on data very quickly.

    ### Import data

        setwd(InputDirectory)
        list.files(InputDirectory, ".csv")
##  [1] "CNS_Mock_01.csv"   "CNS_Mock_02.csv"   "CNS_Mock_03.csv"
##  [4] "CNS_Mock_04.csv"   "CNS_Mock_05.csv"   "CNS_Mock_06.csv"
##  [7] "CNS_WNV_D7_01.csv" "CNS_WNV_D7_02.csv" "CNS_WNV_D7_03.csv"
## [10] "CNS_WNV_D7_04.csv" "CNS_WNV_D7_05.csv" "CNS_WNV_D7_06.csv"
        data.list <- Spectre::read.files(file.loc = InputDirectory,
                                         file.type = ".csv",
                                         do.embed.file.names = TRUE)

By default, the read.files() function will generate some other variables, which you can review, by running the do.list.summary() function.

The ‘name.table’ variable is a table of all the column names for all of your samples (one row per sample, one column per column name). If all of the column names are matching, then this table should be a repeating pattern. If it has been jumbled, then some of your samples have columns that don’t appear in other samples. The ‘ncol.check’ and ‘nrow.check’ are simple tables indicating the number or columns and rows in each sample.

    ### Check the data

        check <- do.list.summary(data.list)

        check$name.table # Review column names and their subsequent values
##      X1  X2   X3   X4    X5   X6   X7   X8  X9      X10    X11
## 1  NK11 CD3 CD45 Ly6G CD11b B220 CD8a Ly6C CD4 FileName FileNo
## 2  NK11 CD3 CD45 Ly6G CD11b B220 CD8a Ly6C CD4 FileName FileNo
## 3  NK11 CD3 CD45 Ly6G CD11b B220 CD8a Ly6C CD4 FileName FileNo
## 4  NK11 CD3 CD45 Ly6G CD11b B220 CD8a Ly6C CD4 FileName FileNo
## 5  NK11 CD3 CD45 Ly6G CD11b B220 CD8a Ly6C CD4 FileName FileNo
## 6  NK11 CD3 CD45 Ly6G CD11b B220 CD8a Ly6C CD4 FileName FileNo
## 7  NK11 CD3 CD45 Ly6G CD11b B220 CD8a Ly6C CD4 FileName FileNo
## 8  NK11 CD3 CD45 Ly6G CD11b B220 CD8a Ly6C CD4 FileName FileNo
## 9  NK11 CD3 CD45 Ly6G CD11b B220 CD8a Ly6C CD4 FileName FileNo
## 10 NK11 CD3 CD45 Ly6G CD11b B220 CD8a Ly6C CD4 FileName FileNo
## 11 NK11 CD3 CD45 Ly6G CD11b B220 CD8a Ly6C CD4 FileName FileNo
## 12 NK11 CD3 CD45 Ly6G CD11b B220 CD8a Ly6C CD4 FileName FileNo
        check$ncol.check # Review number of columns (features, markers) in each sample
##       [,1]
##  [1,] 11
##  [2,] 11
##  [3,] 11
##  [4,] 11
##  [5,] 11
##  [6,] 11
##  [7,] 11
##  [8,] 11
##  [9,] 11
## [10,] 11
## [11,] 11
## [12,] 11
        check$nrow.check # Review number of rows (cells) in each sample
##       [,1]
##  [1,] 9937
##  [2,] 15415
##  [3,] 14246
##  [4,] 17044
##  [5,] 5459
##  [6,] 4891
##  [7,] 17950
##  [8,] 16233
##  [9,] 15999
## [10,] 17131
## [11,] 15926
## [12,] 18773

You can review the first 6 rows of the first sample in your data using the following:

        data.list[[1]]
##            NK11        CD3     CD45       Ly6G     CD11b      B220      CD8a
##    1:   42.3719  40.098700  6885.08  -344.7830 14787.300  -40.2399  83.71750
##    2:   42.9586 119.014000  1780.29  -429.6650  5665.730   86.6673  34.72190
##    3:   59.2366 206.238000 10248.30 -1603.8400 19894.300  427.8310 285.88000
##    4:  364.9480  -0.233878  3740.04  -815.9800  9509.430  182.4200 333.60500
##    5:  440.2470  40.035200  9191.38    40.5055  5745.820 -211.6940 149.22000
##   ---
## 9933:   11.2126  36.951600  2515.82  -647.4930  6172.070  221.9380 266.90000
## 9934:  239.9700 440.217000  7247.28 -1449.7200 15355.400  809.3040 456.59900
## 9935: -134.9650 111.350000  2472.85    81.5975  9657.160 -113.1320   3.79607
## 9936:   86.3333  28.286900  5745.27 -1284.0800 18303.100  353.5290 262.96300
## 9937:   10.1467 122.255000  1971.69  -215.7660   727.708  506.8580 113.14400
##            Ly6C      CD4    FileName FileNo
##    1:  958.7000  711.072 CNS_Mock_01      1
##    2:  448.2590  307.272 CNS_Mock_01      1
##    3: 1008.8300  707.094 CNS_Mock_01      1
##    4:  440.0710  249.784 CNS_Mock_01      1
##    5:   87.4815  867.570 CNS_Mock_01      1
##   ---
## 9933:  141.4200  708.348 CNS_Mock_01      1
## 9934: 2093.6900 2119.270 CNS_Mock_01      1
## 9935: -114.1510  110.743 CNS_Mock_01      1
## 9936:  745.8080  537.750 CNS_Mock_01      1
## 9937:  244.2210 2334.800 CNS_Mock_01      1

Merge data.tables

Once the metadata has been added, we can then merge the data into a single data.table using do.merge.files(). By default, columns with matching names will be aligned in the new table, and any columns that are present in some samples, but not others, will be added and filled with ‘NA’ for any samples that didn’t have that column initially. Once the data has been merged, we can review the data:

    ### Merge data

        cell.dat <- Spectre::do.merge.files(dat = data.list)
        cell.dat
##              NK11        CD3     CD45       Ly6G    CD11b      B220      CD8a
##      1:   42.3719  40.098700  6885.08  -344.7830 14787.30  -40.2399   83.7175
##      2:   42.9586 119.014000  1780.29  -429.6650  5665.73   86.6673   34.7219
##      3:   59.2366 206.238000 10248.30 -1603.8400 19894.30  427.8310  285.8800
##      4:  364.9480  -0.233878  3740.04  -815.9800  9509.43  182.4200  333.6050
##      5:  440.2470  40.035200  9191.38    40.5055  5745.82 -211.6940  149.2200
##     ---
## 169000:  910.8890  72.856100 31466.20  -316.5570 28467.80   -7.7972 -271.8040
## 169001:  -10.2642  64.188700 45188.00  -540.5140 22734.00  202.4110 -936.4920
## 169002: -184.2910  -9.445650 11842.60   -97.9383 17237.00  123.4760 -219.9320
## 169003:  248.3860 229.986000 32288.20  -681.1630 19255.80 -656.0540 -201.5880
## 169004:  738.9810  95.470300 46185.10 -1004.6000 22957.80 -661.6280   72.3356
##               Ly6C       CD4      FileName FileNo
##      1:   958.7000  711.0720   CNS_Mock_01      1
##      2:   448.2590  307.2720   CNS_Mock_01      1
##      3:  1008.8300  707.0940   CNS_Mock_01      1
##      4:   440.0710  249.7840   CNS_Mock_01      1
##      5:    87.4815  867.5700   CNS_Mock_01      1
##     ---
## 169000: 12023.7000 1103.0500 CNS_WNV_D7_06     12
## 169001:  4188.3300  315.9400 CNS_WNV_D7_06     12
## 169002:  8923.4000 -453.4640 CNS_WNV_D7_06     12
## 169003: 10365.7000   61.6765 CNS_WNV_D7_06     12
## 169004:  9704.4700  -31.8532 CNS_WNV_D7_06     12

Read in metadata

    ### Read in metadata

        setwd(MetaDirectory)

        meta.dat <- fread("sample.details.csv")
        meta.dat
##              Filename     Sample Group Batch Cells per sample
##  1:   CNS_Mock_01.csv 01_Mock_01  Mock     A           420000
##  2:   CNS_Mock_02.csv 02_Mock_02  Mock     B           240000
##  3:   CNS_Mock_03.csv 03_Mock_03  Mock     B           256000
##  4:   CNS_Mock_04.csv 04_Mock_04  Mock     A           252000
##  5:   CNS_Mock_05.csv 05_Mock_05  Mock     A           345000
##  6:   CNS_Mock_06.csv 06_Mock_06  Mock     B           702000
##  7: CNS_WNV_D7_01.csv  07_WNV_01   WNV     A          5070000
##  8: CNS_WNV_D7_02.csv  08_WNV_02   WNV     B          2940000
##  9: CNS_WNV_D7_03.csv  09_WNV_03   WNV     A          2120000
## 10: CNS_WNV_D7_04.csv  10_WNV_04   WNV     A          4320000
## 11: CNS_WNV_D7_05.csv  11_WNV_05   WNV     B          4080000
## 12: CNS_WNV_D7_06.csv  12_WNV_06   WNV     A          1830000



3. Data transformation


#######################################################################################################
#### 3. Data transformation
#######################################################################################################

Before we perform clustering etc, we need to meaningfully transform the data. For more information on why this is necessary, please see this page.

Note: If you have imported CSV (channel value) files exported from FlowJo, then no data transformations are required, and you can skip all of the arcsinh transformation steps and proceed straight to adding the metadata. More information on the FCS, CSV scale, and CSV channel value file types can be found here.

    setwd(OutputDirectory)
    dir.create("Output 1 - transformed plots")
    setwd("Output 1 - transformed plots")

First, check the column names of the dataset.

    ### Arcsinh transformation

        as.matrix(names(cell.dat))
##       [,1]
##  [1,] "NK11"
##  [2,] "CD3"
##  [3,] "CD45"
##  [4,] "Ly6G"
##  [5,] "CD11b"
##  [6,] "B220"
##  [7,] "CD8a"
##  [8,] "Ly6C"
##  [9,] "CD4"
## [10,] "FileName"
## [11,] "FileNo"

The columns we want to apply arcsinh transformation to are the cellular columns – column 1 to column 9. We can specify those columns using the code below.

    ### Arcsinh transformation

        as.matrix(names(cell.dat))
##       [,1]
##  [1,] "NK11"
##  [2,] "CD3"
##  [3,] "CD45"
##  [4,] "Ly6G"
##  [5,] "CD11b"
##  [6,] "B220"
##  [7,] "CD8a"
##  [8,] "Ly6C"
##  [9,] "CD4"
## [10,] "FileName"
## [11,] "FileNo"
        to.asinh <- names(cell.dat)[c(1:9)]
        to.asinh
## [1] "NK11"  "CD3"   "CD45"  "Ly6G"  "CD11b" "B220"  "CD8a"  "Ly6C"  "CD4"

Define the cofactor we will use for transformation. As a general recommendation, we suggest using cofactor = 15 for CyTOF data, and cofactor between 100 and 1000 for flow data (we suggest 500 as a starting point). Here is a quick comparison figure showing how different co-factors compare to bi-exponential transformations performed on an LSR-II. For more detailed information on this choice, and for approaches where different cofactors for different columns might be required, see this page.


In this worked example we will use a cofactor of 500.

        cofactor <- 500

You can also choose a column to use for plotting the transformed result – ideally something that is expressed on a variety of cell types in your dataset.

        plot.against <- "Ly6C_asinh"

Now we need to apply arcsinh transformation to the data in those columns, using a specific co-factor.

        cell.dat <- do.asinh(cell.dat, to.asinh, cofactor = cofactor)
        transformed.cols <- paste0(to.asinh, "_asinh")

We can then make some plots to see if the arcsinh transformation is appropriate

        for(i in transformed.cols){
          make.colour.plot(do.subsample(cell.dat, 20000), i, plot.against)
        }

Check the plots and see if you are happy with the transformation. For more detailed guidance, see this page. If happy, the proceed with analysis. Otherwise, go back to the merging of the data.list (to create cell.dat) and try with another co-factor.



4. Add metadata


#######################################################################################################
#### 4. Add metadata and set some preferences
#######################################################################################################

We also want to read in and attach some sample metadata, to aid with our analysis.

    ### Add metadata to data.table

        meta.dat
##              Filename     Sample Group Batch Cells per sample
##  1:   CNS_Mock_01.csv 01_Mock_01  Mock     A           420000
##  2:   CNS_Mock_02.csv 02_Mock_02  Mock     B           240000
##  3:   CNS_Mock_03.csv 03_Mock_03  Mock     B           256000
##  4:   CNS_Mock_04.csv 04_Mock_04  Mock     A           252000
##  5:   CNS_Mock_05.csv 05_Mock_05  Mock     A           345000
##  6:   CNS_Mock_06.csv 06_Mock_06  Mock     B           702000
##  7: CNS_WNV_D7_01.csv  07_WNV_01   WNV     A          5070000
##  8: CNS_WNV_D7_02.csv  08_WNV_02   WNV     B          2940000
##  9: CNS_WNV_D7_03.csv  09_WNV_03   WNV     A          2120000
## 10: CNS_WNV_D7_04.csv  10_WNV_04   WNV     A          4320000
## 11: CNS_WNV_D7_05.csv  11_WNV_05   WNV     B          4080000
## 12: CNS_WNV_D7_06.csv  12_WNV_06   WNV     A          1830000
        sample.info <- meta.dat[,c(1:4)]
        sample.info
##              Filename     Sample Group Batch
##  1:   CNS_Mock_01.csv 01_Mock_01  Mock     A
##  2:   CNS_Mock_02.csv 02_Mock_02  Mock     B
##  3:   CNS_Mock_03.csv 03_Mock_03  Mock     B
##  4:   CNS_Mock_04.csv 04_Mock_04  Mock     A
##  5:   CNS_Mock_05.csv 05_Mock_05  Mock     A
##  6:   CNS_Mock_06.csv 06_Mock_06  Mock     B
##  7: CNS_WNV_D7_01.csv  07_WNV_01   WNV     A
##  8: CNS_WNV_D7_02.csv  08_WNV_02   WNV     B
##  9: CNS_WNV_D7_03.csv  09_WNV_03   WNV     A
## 10: CNS_WNV_D7_04.csv  10_WNV_04   WNV     A
## 11: CNS_WNV_D7_05.csv  11_WNV_05   WNV     B
## 12: CNS_WNV_D7_06.csv  12_WNV_06   WNV     A

Once we have the metadata read into R, we will select only the columns we want to add to our dataset. In this example we only want to include use first four columns (Filename, Sample, Group, and Batch). ‘Filename’ will be used to for matching between cell.dat and meta.dat, and the other three columns will be the information that gets added to cell.dat

        meta.dat
##              Filename     Sample Group Batch Cells per sample
##  1:   CNS_Mock_01.csv 01_Mock_01  Mock     A           420000
##  2:   CNS_Mock_02.csv 02_Mock_02  Mock     B           240000
##  3:   CNS_Mock_03.csv 03_Mock_03  Mock     B           256000
##  4:   CNS_Mock_04.csv 04_Mock_04  Mock     A           252000
##  5:   CNS_Mock_05.csv 05_Mock_05  Mock     A           345000
##  6:   CNS_Mock_06.csv 06_Mock_06  Mock     B           702000
##  7: CNS_WNV_D7_01.csv  07_WNV_01   WNV     A          5070000
##  8: CNS_WNV_D7_02.csv  08_WNV_02   WNV     B          2940000
##  9: CNS_WNV_D7_03.csv  09_WNV_03   WNV     A          2120000
## 10: CNS_WNV_D7_04.csv  10_WNV_04   WNV     A          4320000
## 11: CNS_WNV_D7_05.csv  11_WNV_05   WNV     B          4080000
## 12: CNS_WNV_D7_06.csv  12_WNV_06   WNV     A          1830000
        counts <- meta.dat[,c(2,5)]
        counts
##         Sample Cells per sample
##  1: 01_Mock_01           420000
##  2: 02_Mock_02           240000
##  3: 03_Mock_03           256000
##  4: 04_Mock_04           252000
##  5: 05_Mock_05           345000
##  6: 06_Mock_06           702000
##  7:  07_WNV_01          5070000
##  8:  08_WNV_02          2940000
##  9:  09_WNV_03          2120000
## 10:  10_WNV_04          4320000
## 11:  11_WNV_05          4080000
## 12:  12_WNV_06          1830000

Now we can add this information to cell.dat. Essentially, the file names are listed in the metadata table, and we can use that to add any listed metadata in the table to the corresponding files in data.list. We can thrn review the data to ensure the metadata has been correctly embedded.

        cell.dat <- do.add.cols(cell.dat, "FileName", sample.info, "Filename", rmv.ext = TRUE)
        cell.dat
##              NK11        CD3     CD45       Ly6G    CD11b      B220      CD8a
##      1:   42.3719  40.098700  6885.08  -344.7830 14787.30  -40.2399   83.7175
##      2:   42.9586 119.014000  1780.29  -429.6650  5665.73   86.6673   34.7219
##      3:   59.2366 206.238000 10248.30 -1603.8400 19894.30  427.8310  285.8800
##      4:  364.9480  -0.233878  3740.04  -815.9800  9509.43  182.4200  333.6050
##      5:  440.2470  40.035200  9191.38    40.5055  5745.82 -211.6940  149.2200
##     ---
## 169000:  910.8890  72.856100 31466.20  -316.5570 28467.80   -7.7972 -271.8040
## 169001:  -10.2642  64.188700 45188.00  -540.5140 22734.00  202.4110 -936.4920
## 169002: -184.2910  -9.445650 11842.60   -97.9383 17237.00  123.4760 -219.9320
## 169003:  248.3860 229.986000 32288.20  -681.1630 19255.80 -656.0540 -201.5880
## 169004:  738.9810  95.470300 46185.10 -1004.6000 22957.80 -661.6280   72.3356
##               Ly6C       CD4      FileName FileNo  NK11_asinh    CD3_asinh
##      1:   958.7000  711.0720   CNS_Mock_01      1  0.08464269  0.080111681
##      2:   448.2590  307.2720   CNS_Mock_01      1  0.08581185  0.235835773
##      3:  1008.8300  707.0940   CNS_Mock_01      1  0.11819779  0.401593928
##      4:   440.0710  249.7840   CNS_Mock_01      1  0.67698633 -0.000467756
##      5:    87.4815  867.5700   CNS_Mock_01      1  0.79429776  0.079985087
##     ---
## 169000: 12023.7000 1103.0500 CNS_WNV_D7_06     12  1.36096843  0.145201437
## 169001:  4188.3300  315.9400 CNS_WNV_D7_06     12 -0.02052696  0.128027364
## 169002:  8923.4000 -453.4640 CNS_WNV_D7_06     12 -0.36070893 -0.018890177
## 169003: 10365.7000   61.6765 CNS_WNV_D7_06     12  0.47832275  0.445126321
## 169004:  9704.4700  -31.8532 CNS_WNV_D7_06     12  1.18247624  0.189799003
##         CD45_asinh  Ly6G_asinh CD11b_asinh  B220_asinh  CD8a_asinh Ly6C_asinh
##      1:   3.316967 -0.64409775    4.080349 -0.08039317  0.16666238  1.4060734
##      2:   1.982231 -0.77832998    3.122671  0.17247816  0.06938811  0.8062765
##      3:   3.714001 -1.88215184    4.376885  0.77554551  0.54445897  1.4515055
##      4:   2.709829 -1.26580137    3.639269  0.35719570  0.62559714  0.7940335
##      5:   3.605299  0.08092265    3.136655 -0.41166199  0.29417852  0.1740824
##     ---
## 169000:   4.835271 -0.59701179    4.735145 -0.01559377 -0.51987213  3.8736061
## 169001:   5.197157 -0.93752450    4.510277  0.39450870 -1.38534526  2.8221404
## 169002:   3.858443 -0.19464519    4.233563  0.24450841 -0.42678897  3.5757554
## 169003:   4.861056 -1.11588765    4.344275 -1.08581190 -0.39298254  3.7253778
## 169004:   5.218981 -1.44774229    4.520071 -1.09255123  0.14417124  3.6595440
##           CD4_asinh     Sample Group Batch
##      1:  1.15078593 01_Mock_01  Mock     A
##      2:  0.58125620 01_Mock_01  Mock     A
##      3:  1.14620108 01_Mock_01  Mock     A
##      4:  0.48082540 01_Mock_01  Mock     A
##      5:  1.31850146 01_Mock_01  Mock     A
##     ---
## 169000:  1.53218180  12_WNV_06   WNV     A
## 169001:  0.59596889  12_WNV_06   WNV     A
## 169002: -0.81400762  12_WNV_06   WNV     A
## 169003:  0.12304230  12_WNV_06   WNV     A
## 169004: -0.06366339  12_WNV_06   WNV     A

Check the column names and specify columns that represent cellular features (in this case, the arcsinh transformed data, defined by “markername_asinh”).

    ### Columns

        as.matrix(names(cell.dat))
##       [,1]
##  [1,] "NK11"
##  [2,] "CD3"
##  [3,] "CD45"
##  [4,] "Ly6G"
##  [5,] "CD11b"
##  [6,] "B220"
##  [7,] "CD8a"
##  [8,] "Ly6C"
##  [9,] "CD4"
## [10,] "FileName"
## [11,] "FileNo"
## [12,] "NK11_asinh"
## [13,] "CD3_asinh"
## [14,] "CD45_asinh"
## [15,] "Ly6G_asinh"
## [16,] "CD11b_asinh"
## [17,] "B220_asinh"
## [18,] "CD8a_asinh"
## [19,] "Ly6C_asinh"
## [20,] "CD4_asinh"
## [21,] "Sample"
## [22,] "Group"
## [23,] "Batch"
        cellular.cols <- names(cell.dat)[c(12:20)]
        as.matrix(cellular.cols)
##       [,1]
##  [1,] "NK11_asinh"
##  [2,] "CD3_asinh"
##  [3,] "CD45_asinh"
##  [4,] "Ly6G_asinh"
##  [5,] "CD11b_asinh"
##  [6,] "B220_asinh"
##  [7,] "CD8a_asinh"
##  [8,] "Ly6C_asinh"
##  [9,] "CD4_asinh"

Additionally, specify the columns that will be used to generate cluster and tSNE/UMAP results. Columns that are not specified here will still be analysed, but won’t contributed to the generation of clusters. There are a couple of strategies to take here: use all cellular columns for clustering to looks for all possible cell types/states, or use only stably expressed markers to cluster stable phenotypes, which can then be examined for changes in more dynamic markers. For more guidance, see our tutorials page.

        as.matrix(names(cell.dat))
##       [,1]
##  [1,] "NK11"
##  [2,] "CD3"
##  [3,] "CD45"
##  [4,] "Ly6G"
##  [5,] "CD11b"
##  [6,] "B220"
##  [7,] "CD8a"
##  [8,] "Ly6C"
##  [9,] "CD4"
## [10,] "FileName"
## [11,] "FileNo"
## [12,] "NK11_asinh"
## [13,] "CD3_asinh"
## [14,] "CD45_asinh"
## [15,] "Ly6G_asinh"
## [16,] "CD11b_asinh"
## [17,] "B220_asinh"
## [18,] "CD8a_asinh"
## [19,] "Ly6C_asinh"
## [20,] "CD4_asinh"
## [21,] "Sample"
## [22,] "Group"
## [23,] "Batch"
        cluster.cols <- names(cell.dat)[c(12:20)]
        as.matrix(cluster.cols)
##       [,1]
##  [1,] "NK11_asinh"
##  [2,] "CD3_asinh"
##  [3,] "CD45_asinh"
##  [4,] "Ly6G_asinh"
##  [5,] "CD11b_asinh"
##  [6,] "B220_asinh"
##  [7,] "CD8a_asinh"
##  [8,] "Ly6C_asinh"
##  [9,] "CD4_asinh"

Specify sample, group, and batch columns.

        exp.name <- "CNS experiment"
        sample.col <- "Sample"
        group.col <- "Group"
        batch.col <- "Batch"

Additionally, we want to specify the downsample targets for dimensionality reduction. This influences how many cells will be shown on a tSNE/UMAP plot, and we are specifying the number of cells per group to downsample to. Check for the number of cells (rows) in each group:

    ### Subsample targets per group

        data.frame(table(cell.dat[[group.col]])) # Check number of cells per sample.
##   Var1   Freq
## 1 Mock  66992
## 2  WNV 102012

You can then specify the number to downsample to in each group. These must be lower than the total number of cells in each group. In this example we want 2000 cells from ‘mock’ and 20,000 cells from ‘WNV’, to reflect the number of cells present in each group.

First, check the order that the groups appear in the dataset.

        unique(cell.dat[[group.col]])
## [1] "Mock" "WNV"

Now you can specify the targets (in the order of unique(cell.dat[[group.col]]) above).

        sub.targets <- c(2000, 20000) # target subsample numbers from each group
        sub.targets
## [1]  2000 20000



5. Clustering and DR


#######################################################################################################
#### 5. Clustering and dimensionality reduction
#######################################################################################################
    setwd(OutputDirectory)
    dir.create("Output 2 - clustering")
    setwd("Output 2 - clustering")

We can run clustering using the run.flowsom function. In this case we can define the number of desired metaclusters manually, with the meta.k argument (in this case we have chosen 8). This can be increased or decreased as required. Typically, overclustering is preferred, as multiple clusters that represent a single cellular population can always be annotated as such. Subsequently, we can write the clustered dataset to disk.

    ### Clustering

        cell.dat <- run.flowsom(cell.dat, cluster.cols, meta.k = 8)
        cell.dat
##              NK11        CD3     CD45       Ly6G    CD11b      B220      CD8a
##      1:   42.3719  40.098700  6885.08  -344.7830 14787.30  -40.2399   83.7175
##      2:   42.9586 119.014000  1780.29  -429.6650  5665.73   86.6673   34.7219
##      3:   59.2366 206.238000 10248.30 -1603.8400 19894.30  427.8310  285.8800
##      4:  364.9480  -0.233878  3740.04  -815.9800  9509.43  182.4200  333.6050
##      5:  440.2470  40.035200  9191.38    40.5055  5745.82 -211.6940  149.2200
##     ---
## 169000:  910.8890  72.856100 31466.20  -316.5570 28467.80   -7.7972 -271.8040
## 169001:  -10.2642  64.188700 45188.00  -540.5140 22734.00  202.4110 -936.4920
## 169002: -184.2910  -9.445650 11842.60   -97.9383 17237.00  123.4760 -219.9320
## 169003:  248.3860 229.986000 32288.20  -681.1630 19255.80 -656.0540 -201.5880
## 169004:  738.9810  95.470300 46185.10 -1004.6000 22957.80 -661.6280   72.3356
##               Ly6C       CD4      FileName FileNo  NK11_asinh    CD3_asinh
##      1:   958.7000  711.0720   CNS_Mock_01      1  0.08464269  0.080111681
##      2:   448.2590  307.2720   CNS_Mock_01      1  0.08581185  0.235835773
##      3:  1008.8300  707.0940   CNS_Mock_01      1  0.11819779  0.401593928
##      4:   440.0710  249.7840   CNS_Mock_01      1  0.67698633 -0.000467756
##      5:    87.4815  867.5700   CNS_Mock_01      1  0.79429776  0.079985087
##     ---
## 169000: 12023.7000 1103.0500 CNS_WNV_D7_06     12  1.36096843  0.145201437
## 169001:  4188.3300  315.9400 CNS_WNV_D7_06     12 -0.02052696  0.128027364
## 169002:  8923.4000 -453.4640 CNS_WNV_D7_06     12 -0.36070893 -0.018890177
## 169003: 10365.7000   61.6765 CNS_WNV_D7_06     12  0.47832275  0.445126321
## 169004:  9704.4700  -31.8532 CNS_WNV_D7_06     12  1.18247624  0.189799003
##         CD45_asinh  Ly6G_asinh CD11b_asinh  B220_asinh  CD8a_asinh Ly6C_asinh
##      1:   3.316967 -0.64409775    4.080349 -0.08039317  0.16666238  1.4060734
##      2:   1.982231 -0.77832998    3.122671  0.17247816  0.06938811  0.8062765
##      3:   3.714001 -1.88215184    4.376885  0.77554551  0.54445897  1.4515055
##      4:   2.709829 -1.26580137    3.639269  0.35719570  0.62559714  0.7940335
##      5:   3.605299  0.08092265    3.136655 -0.41166199  0.29417852  0.1740824
##     ---
## 169000:   4.835271 -0.59701179    4.735145 -0.01559377 -0.51987213  3.8736061
## 169001:   5.197157 -0.93752450    4.510277  0.39450870 -1.38534526  2.8221404
## 169002:   3.858443 -0.19464519    4.233563  0.24450841 -0.42678897  3.5757554
## 169003:   4.861056 -1.11588765    4.344275 -1.08581190 -0.39298254  3.7253778
## 169004:   5.218981 -1.44774229    4.520071 -1.09255123  0.14417124  3.6595440
##           CD4_asinh     Sample Group Batch FlowSOM_cluster FlowSOM_metacluster
##      1:  1.15078593 01_Mock_01  Mock     A              74                   5
##      2:  0.58125620 01_Mock_01  Mock     A             114                   5
##      3:  1.14620108 01_Mock_01  Mock     A              90                   5
##      4:  0.48082540 01_Mock_01  Mock     A             102                   5
##      5:  1.31850146 01_Mock_01  Mock     A             173                   5
##     ---
## 169000:  1.53218180  12_WNV_06   WNV     A             125                   4
## 169001:  0.59596889  12_WNV_06   WNV     A             123                   4
## 169002: -0.81400762  12_WNV_06   WNV     A             165                   4
## 169003:  0.12304230  12_WNV_06   WNV     A              83                   4
## 169004: -0.06366339  12_WNV_06   WNV     A              83                   4

We can then run dimensionality reduction on a subset of the data, allow us to visualise the data and resulting clusters. In this case we have used run.umap, though other options are available, including run.fitsne and run.tsne. As before, this subsampled dataset with DR coordinates is saved to disk.

    ### Dimensionality reduction

        cell.sub <- do.subsample(cell.dat, sub.targets, group.col)
        cell.sub
##             NK11       CD3     CD45      Ly6G    CD11b      B220      CD8a
##     1:  106.0150 235.13900 13574.70  -179.318 31385.80 -169.4210 -394.6730
##     2:  132.7940  16.52790  9533.69  -673.468 22061.20  184.8820  282.5120
##     3:  236.6740 143.33000  7805.94  -605.170 11658.20  658.9580  107.7280
##     4:   14.8980 -10.63350  3628.48   468.071 12264.60 -178.1110 -194.3480
##     5:  194.5450   3.62867  3192.66  -515.083 11939.20  -33.5803   42.3089
##    ---
## 21996:  -16.4986  11.00010 24706.30  -263.460  9151.59 -144.2120 -499.9370
## 21997:  119.8740  92.33010 43645.10 -1130.550 38278.10  -42.0887 -445.5530
## 21998: 2884.3600  88.01560 88998.60 -4851.610  4889.69 4999.7900  -58.7703
## 21999:  560.6400 151.51900 55440.80 -2633.550 23937.20  -95.7631  656.6650
## 22000:  118.9280 331.26200 52654.70 -3161.200 25113.80 -938.5550  155.8260
##             Ly6C       CD4      FileName FileNo  NK11_asinh    CD3_asinh
##     1:   380.647  509.4590   CNS_Mock_05      5  0.21047261  0.454470942
##     2:   608.245 1095.4700   CNS_Mock_04      4  0.26256084  0.033049783
##     3:   805.445 1937.4200   CNS_Mock_04      4  0.45724742  0.282872452
##     4:   126.416  227.9000   CNS_Mock_04      4  0.02979159 -0.021265397
##     5:   666.571  358.8580   CNS_Mock_02      2  0.37988669  0.007257276
##    ---
## 21996: 22334.600   59.6839 CNS_WNV_D7_04     10 -0.03299121  0.021998426
## 21997:  4490.860  727.2510 CNS_WNV_D7_06     12  0.23750870  0.183626518
## 21998:  1175.860  273.2380 CNS_WNV_D7_01      7  2.45302657  0.175134535
## 21999: 13667.400  334.3160 CNS_WNV_D7_04     10  0.96458591  0.298581704
## 22000: 23457.800 2331.7400 CNS_WNV_D7_01      7  0.23566844  0.621694916
##        CD45_asinh Ly6G_asinh CD11b_asinh  B220_asinh  CD8a_asinh Ly6C_asinh
##     1:   3.994841 -0.3513617    4.832713 -0.33267180 -0.72432725  0.7021579
##     2:   3.641813 -1.1067473    4.480243  0.36181778  0.53860285  1.0264871
##     3:   3.442203 -1.0225758    3.842764  1.08932754  0.21382293  1.2547409
##     4:   2.679822  0.8354908    3.893427 -0.34908852 -0.37951948  0.2502130
##     5:   2.553225 -0.9025441    3.866560 -0.06711021  0.08451714  1.0984975
##    ---
## 21996:   4.593455 -0.5051593    3.600967 -0.28456776 -0.88128449  4.4925567
## 21997:   5.162418 -1.5546568    5.031215 -0.08407830 -0.80224144  2.8914233
## 21998:   5.874923 -2.9682497    2.976027  2.99818116 -0.11727162  1.5907079
## 21999:   5.401630 -2.3635192    4.561837 -0.19037418  1.08655240  4.0016422
## 22000:   5.350072 -2.5434425    4.609811 -1.38728687  0.30681557  4.5416110
##        CD4_asinh     Sample Group Batch FlowSOM_cluster FlowSOM_metacluster
##     1: 0.8946876 05_Mock_05  Mock     A             133                   5
##     2: 1.5259050 04_Mock_04  Mock     A             146                   5
##     3: 2.0639011 04_Mock_04  Mock     A              76                   5
##     4: 0.4413331 04_Mock_04  Mock     A             169                   5
##     5: 0.6671197 02_Mock_02  Mock     B             118                   5
##    ---
## 21996: 0.1190861  10_WNV_04   WNV     A             151                   4
## 21997: 1.1692576  12_WNV_06   WNV     A             123                   4
## 21998: 0.5223904  07_WNV_01   WNV     A              29                   1
## 21999: 0.6267796  10_WNV_04   WNV     A              94                   4
## 22000: 2.2442111  07_WNV_01   WNV     A               9                   4
        cell.sub <- run.umap(cell.sub, cluster.cols)
        cell.sub
##             NK11       CD3     CD45      Ly6G    CD11b      B220      CD8a
##     1:  106.0150 235.13900 13574.70  -179.318 31385.80 -169.4210 -394.6730
##     2:  132.7940  16.52790  9533.69  -673.468 22061.20  184.8820  282.5120
##     3:  236.6740 143.33000  7805.94  -605.170 11658.20  658.9580  107.7280
##     4:   14.8980 -10.63350  3628.48   468.071 12264.60 -178.1110 -194.3480
##     5:  194.5450   3.62867  3192.66  -515.083 11939.20  -33.5803   42.3089
##    ---
## 21996:  -16.4986  11.00010 24706.30  -263.460  9151.59 -144.2120 -499.9370
## 21997:  119.8740  92.33010 43645.10 -1130.550 38278.10  -42.0887 -445.5530
## 21998: 2884.3600  88.01560 88998.60 -4851.610  4889.69 4999.7900  -58.7703
## 21999:  560.6400 151.51900 55440.80 -2633.550 23937.20  -95.7631  656.6650
## 22000:  118.9280 331.26200 52654.70 -3161.200 25113.80 -938.5550  155.8260
##             Ly6C       CD4      FileName FileNo  NK11_asinh    CD3_asinh
##     1:   380.647  509.4590   CNS_Mock_05      5  0.21047261  0.454470942
##     2:   608.245 1095.4700   CNS_Mock_04      4  0.26256084  0.033049783
##     3:   805.445 1937.4200   CNS_Mock_04      4  0.45724742  0.282872452
##     4:   126.416  227.9000   CNS_Mock_04      4  0.02979159 -0.021265397
##     5:   666.571  358.8580   CNS_Mock_02      2  0.37988669  0.007257276
##    ---
## 21996: 22334.600   59.6839 CNS_WNV_D7_04     10 -0.03299121  0.021998426
## 21997:  4490.860  727.2510 CNS_WNV_D7_06     12  0.23750870  0.183626518
## 21998:  1175.860  273.2380 CNS_WNV_D7_01      7  2.45302657  0.175134535
## 21999: 13667.400  334.3160 CNS_WNV_D7_04     10  0.96458591  0.298581704
## 22000: 23457.800 2331.7400 CNS_WNV_D7_01      7  0.23566844  0.621694916
##        CD45_asinh Ly6G_asinh CD11b_asinh  B220_asinh  CD8a_asinh Ly6C_asinh
##     1:   3.994841 -0.3513617    4.832713 -0.33267180 -0.72432725  0.7021579
##     2:   3.641813 -1.1067473    4.480243  0.36181778  0.53860285  1.0264871
##     3:   3.442203 -1.0225758    3.842764  1.08932754  0.21382293  1.2547409
##     4:   2.679822  0.8354908    3.893427 -0.34908852 -0.37951948  0.2502130
##     5:   2.553225 -0.9025441    3.866560 -0.06711021  0.08451714  1.0984975
##    ---
## 21996:   4.593455 -0.5051593    3.600967 -0.28456776 -0.88128449  4.4925567
## 21997:   5.162418 -1.5546568    5.031215 -0.08407830 -0.80224144  2.8914233
## 21998:   5.874923 -2.9682497    2.976027  2.99818116 -0.11727162  1.5907079
## 21999:   5.401630 -2.3635192    4.561837 -0.19037418  1.08655240  4.0016422
## 22000:   5.350072 -2.5434425    4.609811 -1.38728687  0.30681557  4.5416110
##        CD4_asinh     Sample Group Batch FlowSOM_cluster FlowSOM_metacluster
##     1: 0.8946876 05_Mock_05  Mock     A             133                   5
##     2: 1.5259050 04_Mock_04  Mock     A             146                   5
##     3: 2.0639011 04_Mock_04  Mock     A              76                   5
##     4: 0.4413331 04_Mock_04  Mock     A             169                   5
##     5: 0.6671197 02_Mock_02  Mock     B             118                   5
##    ---
## 21996: 0.1190861  10_WNV_04   WNV     A             151                   4
## 21997: 1.1692576  12_WNV_06   WNV     A             123                   4
## 21998: 0.5223904  07_WNV_01   WNV     A              29                   1
## 21999: 0.6267796  10_WNV_04   WNV     A              94                   4
## 22000: 2.2442111  07_WNV_01   WNV     A               9                   4
##             UMAP_X     UMAP_Y
##     1:  -2.6476458 -2.2792177
##     2:  -1.9378009 -3.0190689
##     3:  -2.1449110 -3.0987564
##     4:  -4.3262397 -2.0804274
##     5:  -3.3093684 -3.2406053
##    ---
## 21996:   0.1274319  2.3281831
## 21997:   2.1111617  2.4060408
## 21998: -10.4458815 -7.2565838
## 21999:   4.9641307  0.4798327
## 22000:   6.4724930  2.3447172

We can visualise the DR data to asses which clusters represent cellular populations.

    ### DR plots

        make.colour.plot(cell.sub, "UMAP_X", "UMAP_Y", "FlowSOM_metacluster", col.type = 'factor', add.label = TRUE)

        make.multi.plot(cell.sub, "UMAP_X", "UMAP_Y", cellular.cols)

We can also generate some multi plots to compare between experimental groups or batches.

        make.multi.plot(cell.sub, "UMAP_X", "UMAP_Y", "FlowSOM_metacluster", group.col, col.type = 'factor')

We can also produce expression heatmaps to help guide our interpretation of cluster identities.

    ### Expression heatmap

        exp <- do.aggregate(cell.dat, cellular.cols, by = "FlowSOM_metacluster")
        make.pheatmap(exp, "FlowSOM_metacluster", cellular.cols)



6. Annotate clusters


#######################################################################################################
#### 6. Annotate clusters
#######################################################################################################

Review the cluster labels and marker expression patterns, so you can annotate the clusters. This annotation is optional, as all subsequent steps can be performed on the ‘clusters’ instead of the ‘populations’. Here we can create a list of population names, and then specify which clusters make up that population (e.g. CD4 T cells are contained within cluster ‘3’).

    setwd(OutputDirectory)
    dir.create("Output 3 - annotation")
    setwd("Output 3 - annotation")
    ### Annotate

        annots <- list("CD4 T cells" = c(3),
                       "CD8 T cells" = c(2),
                       "NK cells" = c(1),
                       "Neutrophils" = c(8),
                       "Infil Macrophages" = c(4),
                       "Microglia" = c(5,6,7)
        )

Once the annotation list is created, we can switch the list into a table format to annotate our data.

        annots <- do.list.switch(annots)
        names(annots) <- c("Values", "Population")
        setorderv(annots, 'Values')
        annots
##    Values        Population
## 1:      1          NK cells
## 2:      2       CD8 T cells
## 3:      3       CD4 T cells
## 4:      4 Infil Macrophages
## 5:      5         Microglia
## 6:      6         Microglia
## 7:      7         Microglia
## 8:      8       Neutrophils

Using the do.add.cols function, we can add the population names to the corresponding clusters.

    ### Add annotations

        cell.dat <- do.add.cols(cell.dat, "FlowSOM_metacluster", annots, "Values")
        cell.dat
##              NK11        CD3     CD45       Ly6G    CD11b      B220      CD8a
##      1:   42.3719  40.098700  6885.08  -344.7830 14787.30  -40.2399   83.7175
##      2:   42.9586 119.014000  1780.29  -429.6650  5665.73   86.6673   34.7219
##      3:   59.2366 206.238000 10248.30 -1603.8400 19894.30  427.8310  285.8800
##      4:  364.9480  -0.233878  3740.04  -815.9800  9509.43  182.4200  333.6050
##      5:  440.2470  40.035200  9191.38    40.5055  5745.82 -211.6940  149.2200
##     ---
## 169000:  910.8890  72.856100 31466.20  -316.5570 28467.80   -7.7972 -271.8040
## 169001:  -10.2642  64.188700 45188.00  -540.5140 22734.00  202.4110 -936.4920
## 169002: -184.2910  -9.445650 11842.60   -97.9383 17237.00  123.4760 -219.9320
## 169003:  248.3860 229.986000 32288.20  -681.1630 19255.80 -656.0540 -201.5880
## 169004:  738.9810  95.470300 46185.10 -1004.6000 22957.80 -661.6280   72.3356
##               Ly6C       CD4      FileName FileNo  NK11_asinh    CD3_asinh
##      1:   958.7000  711.0720   CNS_Mock_01      1  0.08464269  0.080111681
##      2:   448.2590  307.2720   CNS_Mock_01      1  0.08581185  0.235835773
##      3:  1008.8300  707.0940   CNS_Mock_01      1  0.11819779  0.401593928
##      4:   440.0710  249.7840   CNS_Mock_01      1  0.67698633 -0.000467756
##      5:    87.4815  867.5700   CNS_Mock_01      1  0.79429776  0.079985087
##     ---
## 169000: 12023.7000 1103.0500 CNS_WNV_D7_06     12  1.36096843  0.145201437
## 169001:  4188.3300  315.9400 CNS_WNV_D7_06     12 -0.02052696  0.128027364
## 169002:  8923.4000 -453.4640 CNS_WNV_D7_06     12 -0.36070893 -0.018890177
## 169003: 10365.7000   61.6765 CNS_WNV_D7_06     12  0.47832275  0.445126321
## 169004:  9704.4700  -31.8532 CNS_WNV_D7_06     12  1.18247624  0.189799003
##         CD45_asinh  Ly6G_asinh CD11b_asinh  B220_asinh  CD8a_asinh Ly6C_asinh
##      1:   3.316967 -0.64409775    4.080349 -0.08039317  0.16666238  1.4060734
##      2:   1.982231 -0.77832998    3.122671  0.17247816  0.06938811  0.8062765
##      3:   3.714001 -1.88215184    4.376885  0.77554551  0.54445897  1.4515055
##      4:   2.709829 -1.26580137    3.639269  0.35719570  0.62559714  0.7940335
##      5:   3.605299  0.08092265    3.136655 -0.41166199  0.29417852  0.1740824
##     ---
## 169000:   4.835271 -0.59701179    4.735145 -0.01559377 -0.51987213  3.8736061
## 169001:   5.197157 -0.93752450    4.510277  0.39450870 -1.38534526  2.8221404
## 169002:   3.858443 -0.19464519    4.233563  0.24450841 -0.42678897  3.5757554
## 169003:   4.861056 -1.11588765    4.344275 -1.08581190 -0.39298254  3.7253778
## 169004:   5.218981 -1.44774229    4.520071 -1.09255123  0.14417124  3.6595440
##           CD4_asinh     Sample Group Batch FlowSOM_cluster FlowSOM_metacluster
##      1:  1.15078593 01_Mock_01  Mock     A              74                   5
##      2:  0.58125620 01_Mock_01  Mock     A             114                   5
##      3:  1.14620108 01_Mock_01  Mock     A              90                   5
##      4:  0.48082540 01_Mock_01  Mock     A             102                   5
##      5:  1.31850146 01_Mock_01  Mock     A             173                   5
##     ---
## 169000:  1.53218180  12_WNV_06   WNV     A             125                   4
## 169001:  0.59596889  12_WNV_06   WNV     A             123                   4
## 169002: -0.81400762  12_WNV_06   WNV     A             165                   4
## 169003:  0.12304230  12_WNV_06   WNV     A              83                   4
## 169004: -0.06366339  12_WNV_06   WNV     A              83                   4
##                Population
##      1:         Microglia
##      2:         Microglia
##      3:         Microglia
##      4:         Microglia
##      5:         Microglia
##     ---
## 169000: Infil Macrophages
## 169001: Infil Macrophages
## 169002: Infil Macrophages
## 169003: Infil Macrophages
## 169004: Infil Macrophages
        cell.sub <- do.add.cols(cell.sub, "FlowSOM_metacluster", annots, "Values")
        cell.sub
##             NK11       CD3     CD45      Ly6G    CD11b      B220      CD8a
##     1:  106.0150 235.13900 13574.70  -179.318 31385.80 -169.4210 -394.6730
##     2:  132.7940  16.52790  9533.69  -673.468 22061.20  184.8820  282.5120
##     3:  236.6740 143.33000  7805.94  -605.170 11658.20  658.9580  107.7280
##     4:   14.8980 -10.63350  3628.48   468.071 12264.60 -178.1110 -194.3480
##     5:  194.5450   3.62867  3192.66  -515.083 11939.20  -33.5803   42.3089
##    ---
## 21996:  -16.4986  11.00010 24706.30  -263.460  9151.59 -144.2120 -499.9370
## 21997:  119.8740  92.33010 43645.10 -1130.550 38278.10  -42.0887 -445.5530
## 21998: 2884.3600  88.01560 88998.60 -4851.610  4889.69 4999.7900  -58.7703
## 21999:  560.6400 151.51900 55440.80 -2633.550 23937.20  -95.7631  656.6650
## 22000:  118.9280 331.26200 52654.70 -3161.200 25113.80 -938.5550  155.8260
##             Ly6C       CD4      FileName FileNo  NK11_asinh    CD3_asinh
##     1:   380.647  509.4590   CNS_Mock_05      5  0.21047261  0.454470942
##     2:   608.245 1095.4700   CNS_Mock_04      4  0.26256084  0.033049783
##     3:   805.445 1937.4200   CNS_Mock_04      4  0.45724742  0.282872452
##     4:   126.416  227.9000   CNS_Mock_04      4  0.02979159 -0.021265397
##     5:   666.571  358.8580   CNS_Mock_02      2  0.37988669  0.007257276
##    ---
## 21996: 22334.600   59.6839 CNS_WNV_D7_04     10 -0.03299121  0.021998426
## 21997:  4490.860  727.2510 CNS_WNV_D7_06     12  0.23750870  0.183626518
## 21998:  1175.860  273.2380 CNS_WNV_D7_01      7  2.45302657  0.175134535
## 21999: 13667.400  334.3160 CNS_WNV_D7_04     10  0.96458591  0.298581704
## 22000: 23457.800 2331.7400 CNS_WNV_D7_01      7  0.23566844  0.621694916
##        CD45_asinh Ly6G_asinh CD11b_asinh  B220_asinh  CD8a_asinh Ly6C_asinh
##     1:   3.994841 -0.3513617    4.832713 -0.33267180 -0.72432725  0.7021579
##     2:   3.641813 -1.1067473    4.480243  0.36181778  0.53860285  1.0264871
##     3:   3.442203 -1.0225758    3.842764  1.08932754  0.21382293  1.2547409
##     4:   2.679822  0.8354908    3.893427 -0.34908852 -0.37951948  0.2502130
##     5:   2.553225 -0.9025441    3.866560 -0.06711021  0.08451714  1.0984975
##    ---
## 21996:   4.593455 -0.5051593    3.600967 -0.28456776 -0.88128449  4.4925567
## 21997:   5.162418 -1.5546568    5.031215 -0.08407830 -0.80224144  2.8914233
## 21998:   5.874923 -2.9682497    2.976027  2.99818116 -0.11727162  1.5907079
## 21999:   5.401630 -2.3635192    4.561837 -0.19037418  1.08655240  4.0016422
## 22000:   5.350072 -2.5434425    4.609811 -1.38728687  0.30681557  4.5416110
##        CD4_asinh     Sample Group Batch FlowSOM_cluster FlowSOM_metacluster
##     1: 0.8946876 05_Mock_05  Mock     A             133                   5
##     2: 1.5259050 04_Mock_04  Mock     A             146                   5
##     3: 2.0639011 04_Mock_04  Mock     A              76                   5
##     4: 0.4413331 04_Mock_04  Mock     A             169                   5
##     5: 0.6671197 02_Mock_02  Mock     B             118                   5
##    ---
## 21996: 0.1190861  10_WNV_04   WNV     A             151                   4
## 21997: 1.1692576  12_WNV_06   WNV     A             123                   4
## 21998: 0.5223904  07_WNV_01   WNV     A              29                   1
## 21999: 0.6267796  10_WNV_04   WNV     A              94                   4
## 22000: 2.2442111  07_WNV_01   WNV     A               9                   4
##             UMAP_X     UMAP_Y        Population
##     1:  -2.6476458 -2.2792177         Microglia
##     2:  -1.9378009 -3.0190689         Microglia
##     3:  -2.1449110 -3.0987564         Microglia
##     4:  -4.3262397 -2.0804274         Microglia
##     5:  -3.3093684 -3.2406053         Microglia
##    ---
## 21996:   0.1274319  2.3281831 Infil Macrophages
## 21997:   2.1111617  2.4060408 Infil Macrophages
## 21998: -10.4458815 -7.2565838          NK cells
## 21999:   4.9641307  0.4798327 Infil Macrophages
## 22000:   6.4724930  2.3447172 Infil Macrophages

Subsequently, we can visualise the population labels on a UMAP plot.

        make.colour.plot(cell.sub, "UMAP_X", "UMAP_Y", "Population", col.type = 'factor', add.label = TRUE)

        make.multi.plot(cell.sub, "UMAP_X", "UMAP_Y", "Population", group.col, col.type = 'factor')

We can also generate an expression heatmap to summarise the expression levels of each marker on our populations.

    ### Expression heatmap

        rm(exp)
        exp <- do.aggregate(cell.dat, cellular.cols, by = "Population")
        make.pheatmap(exp, "Population", cellular.cols)

    ### Write data

        setwd(OutputDirectory)
        setwd("Output 3 - annotation")

        fwrite(cell.dat, "Annotated.data.csv")
        fwrite(cell.sub, "Annotated.data.DR.csv")
    ### Write FCS files

        setwd(OutputDirectory)
        setwd("Output 3 - annotation")
        dir.create('FCS files - all')
        setwd('FCS files - all')

        write.files(cell.dat,
                    file.prefix = exp.name,
                    divide.by = sample.col,
                    write.csv = FALSE,
                    write.fcs = TRUE)
        setwd(OutputDirectory)
        setwd("Output 3 - annotation")
        dir.create('FCS files - DR')
        setwd('FCS files - DR')

        write.files(cell.sub,
                    file.prefix = paste0('DR-', exp.name),
                    divide.by = sample.col,
                    write.csv = FALSE,
                    write.fcs = TRUE)



7. Summary data and stats


#######################################################################################################
#### 7. Summary data and statistical analysis
#######################################################################################################

Here we can create ‘summary’ data for our experiment. This involves calculating the percentage of each population in each sample, along with the corresponding cell counts if the information is available. In addition, we calculate the MFI for selected markers on each population in each sample.

First, set the working directory, and select which columns we will measure the MFI of. In this case, CD11b_asinh and Ly6C_asinh.

    setwd(OutputDirectory)
    dir.create("Output 4 - summary data")
    setwd("Output 4 - summary data")
    ### Setup

        variance.test <- 'kruskal.test'
        pairwise.test <- "wilcox.test"

        as.matrix(unique(cell.dat[[group.col]]))
##      [,1]
## [1,] "Mock"
## [2,] "WNV"
        comparisons <- list(c("Mock", "WNV"))
        comparisons
## [[1]]
## [1] "Mock" "WNV"
        grp.order <- c("Mock", "WNV")
        grp.order
## [1] "Mock" "WNV"

We can also specify which columns we wish to measure MFI levels on.

    ### Select columns to measure MFI

        as.matrix(cellular.cols)
##       [,1]
##  [1,] "NK11_asinh"
##  [2,] "CD3_asinh"
##  [3,] "CD45_asinh"
##  [4,] "Ly6G_asinh"
##  [5,] "CD11b_asinh"
##  [6,] "B220_asinh"
##  [7,] "CD8a_asinh"
##  [8,] "Ly6C_asinh"
##  [9,] "CD4_asinh"
        dyn.cols <- cellular.cols[c(5,8)]
        dyn.cols
## [1] "CD11b_asinh" "Ly6C_asinh"

Create summary data

Use the new create.sumtable function to generate summary data – a data.table of samples (rows) vs measurements (columns).

    ### Create summary tables

        sum.dat <- create.sumtable(dat = cell.dat,
                                   sample.col = sample.col,
                                   pop.col = "Population",
                                   use.cols = dyn.cols,
                                   annot.cols = c(group.col, batch.col),
                                   counts = counts)

Once the summary data has been generated, we can review it and select which columns to plot. In each case, the column names (i.e. name of each summary measure) are structured as ‘MEASURE TYPE – POPULATION’. This provides a useful structure, as we can use regular expression searches to split the name into just the MEASURE TYPE or POPULATION segment.

    ### Review summary data

        sum.dat
##         Sample Group Batch Percent of sample -- CD4 T cells
##  1: 01_Mock_01  Mock     A                        0.7245648
##  2: 02_Mock_02  Mock     B                        0.5124878
##  3: 03_Mock_03  Mock     B                        0.5475221
##  4: 04_Mock_04  Mock     A                        0.2170852
##  5: 05_Mock_05  Mock     A                        0.1831837
##  6: 06_Mock_06  Mock     B                        0.3680229
##  7:  07_WNV_01   WNV     A                        5.4317549
##  8:  08_WNV_02   WNV     B                        2.0575371
##  9:  09_WNV_03   WNV     A                        2.6439152
## 10:  10_WNV_04   WNV     A                        2.6560037
## 11:  11_WNV_05   WNV     B                        3.1457993
## 12:  12_WNV_06   WNV     A                        3.4038246
##     Percent of sample -- CD8 T cells Percent of sample -- Infil Macrophages
##  1:                        2.6366106                               2.535977
##  2:                        1.0249757                               2.886799
##  3:                        0.9055173                               2.709533
##  4:                        0.4165689                               1.942032
##  5:                        0.5129145                               3.059168
##  6:                        1.3698630                               6.276835
##  7:                        5.0362117                              62.791086
##  8:                        2.4271546                              70.584612
##  9:                        3.5752235                              70.654416
## 10:                        3.2689277                              67.357422
## 11:                        3.6983549                              66.375738
## 12:                        3.8619294                              68.028552
##     Percent of sample -- Microglia Percent of sample -- Neutrophils
##  1:                       89.85609                         2.787562
##  2:                       90.96335                         2.802465
##  3:                       90.74828                         3.130703
##  4:                       93.99202                         2.217789
##  5:                       91.59187                         3.278989
##  6:                       81.02637                         7.605807
##  7:                       11.70474                         2.295265
##  8:                       11.26717                         4.841989
##  9:                       10.95693                         3.850241
## 10:                       10.78746                         8.043897
## 11:                       11.33367                         1.632551
## 12:                       12.33687                         1.869706
##     Percent of sample -- NK cells Cells per sample -- CD4 T cells
##  1:                      1.459193                       3043.1720
##  2:                      1.809925                       1229.9708
##  3:                      1.958444                       1401.6566
##  4:                      1.214504                        547.0547
##  5:                      1.373878                        631.9839
##  6:                      3.353098                       2583.5208
##  7:                     12.740947                     275389.9721
##  8:                      8.821536                      60491.5912
##  9:                      8.319270                      56051.0032
## 10:                      7.886288                     114739.3614
## 11:                     13.813889                     128348.6123
## 12:                     10.499121                      62289.9909
##     Cells per sample -- CD8 T cells Cells per sample -- Infil Macrophages
##  1:                       11073.765                             10651.102
##  2:                        2459.942                              6928.317
##  3:                        2318.124                              6936.403
##  4:                        1049.754                              4893.922
##  5:                        1769.555                             10554.131
##  6:                        9616.438                             44063.382
##  7:                      255335.933                           3183508.078
##  8:                       71358.344                           2075187.581
##  9:                       75794.737                           1497873.617
## 10:                      141217.676                           2909840.640
## 11:                      150892.880                           2708130.102
## 12:                       70673.307                           1244922.495
##     Cells per sample -- Microglia Cells per sample -- Neutrophils
##  1:                      377395.6                       11707.759
##  2:                      218312.0                        6725.916
##  3:                      232315.6                        8014.601
##  4:                      236859.9                        5588.829
##  5:                      315991.9                       11312.511
##  6:                      568805.2                       53392.762
##  7:                      593430.1                      116369.916
##  8:                      331254.9                      142354.463
##  9:                      232287.0                       81625.102
## 10:                      466018.3                      347496.352
## 11:                      462413.7                       66608.062
## 12:                      225764.7                       34215.629
##     Cells per sample -- NK cells MFI of CD11b_asinh -- CD4 T cells
##  1:                     6128.610                         1.2414725
##  2:                     4343.821                         1.0786205
##  3:                     5013.618                         1.1676257
##  4:                     3060.549                         1.1347100
##  5:                     4739.879                         0.9687439
##  6:                    23538.745                         0.9862779
##  7:                   645966.017                         2.2005285
##  8:                   259353.169                         2.1292244
##  9:                   176368.523                         2.1904049
## 10:                   340687.642                         2.0086132
## 11:                   563606.681                         2.1888746
## 12:                   192133.916                         2.1314392
##     MFI of CD11b_asinh -- CD8 T cells MFI of CD11b_asinh -- Infil Macrophages
##  1:                         1.2334297                                3.642553
##  2:                         1.2417039                                3.844048
##  3:                         1.0558547                                3.932349
##  4:                         1.0876666                                3.861677
##  5:                         0.7508206                                3.687069
##  6:                         0.9401635                                3.445202
##  7:                         2.2189689                                4.550209
##  8:                         1.9571303                                4.470230
##  9:                         2.2052763                                4.656109
## 10:                         1.9871644                                4.417568
## 11:                         2.1533487                                4.826636
## 12:                         2.2124538                                4.672961
##     MFI of CD11b_asinh -- Microglia MFI of CD11b_asinh -- Neutrophils
##  1:                        3.706870                          4.436716
##  2:                        3.690308                          4.380407
##  3:                        3.733263                          4.491744
##  4:                        3.790280                          4.309418
##  5:                        3.751274                          4.384315
##  6:                        3.742470                          4.546994
##  7:                        3.834684                          5.650920
##  8:                        3.808416                          5.787374
##  9:                        3.977008                          5.651509
## 10:                        3.890926                          5.819943
## 11:                        3.860424                          5.493182
## 12:                        3.949185                          5.416691
##     MFI of CD11b_asinh -- NK cells MFI of Ly6C_asinh -- CD4 T cells
##  1:                      0.8791643                        1.2480876
##  2:                      1.1431743                        1.3403367
##  3:                      1.2581158                        1.1142238
##  4:                      1.0704986                        0.8223323
##  5:                      0.9694068                        1.8349176
##  6:                      0.6555243                        1.0034167
##  7:                      2.9429910                        3.1288202
##  8:                      2.9621273                        2.6370669
##  9:                      2.8882869                        2.3947988
## 10:                      2.8534774                        2.6634397
## 11:                      2.8121396                        2.7549118
## 12:                      2.7216275                        2.2445633
##     MFI of Ly6C_asinh -- CD8 T cells MFI of Ly6C_asinh -- Infil Macrophages
##  1:                         4.231691                               4.412827
##  2:                         1.541403                               4.318295
##  3:                         2.590558                               4.501798
##  4:                         2.297638                               4.165851
##  5:                         4.131631                               4.297579
##  6:                         3.543229                               4.730798
##  7:                         3.222001                               4.677897
##  8:                         2.668671                               4.207133
##  9:                         2.533847                               3.958164
## 10:                         2.671703                               4.367023
## 11:                         2.913979                               4.182381
## 12:                         2.378752                               3.877802
##     MFI of Ly6C_asinh -- Microglia MFI of Ly6C_asinh -- Neutrophils
##  1:                      0.8952408                         3.962119
##  2:                      0.8590031                         4.113934
##  3:                      0.9043240                         4.147993
##  4:                      0.8586159                         4.030143
##  5:                      0.8540362                         4.111129
##  6:                      0.9350437                         4.028560
##  7:                      1.6422281                         3.236892
##  8:                      1.4882650                         2.863273
##  9:                      1.4450478                         2.489240
## 10:                      1.5985993                         2.987165
## 11:                      1.3868439                         2.808331
## 12:                      1.3959113                         2.302332
##     MFI of Ly6C_asinh -- NK cells
##  1:                     0.4097459
##  2:                     0.4015625
##  3:                     0.3663482
##  4:                     0.2905718
##  5:                     0.2930227
##  6:                     0.2847913
##  7:                     1.5759020
##  8:                     1.2813973
##  9:                     1.0971466
## 10:                     1.3549706
## 11:                     1.1439783
## 12:                     1.0962885
        as.matrix(names(sum.dat))
##       [,1]
##  [1,] "Sample"
##  [2,] "Group"
##  [3,] "Batch"
##  [4,] "Percent of sample -- CD4 T cells"
##  [5,] "Percent of sample -- CD8 T cells"
##  [6,] "Percent of sample -- Infil Macrophages"
##  [7,] "Percent of sample -- Microglia"
##  [8,] "Percent of sample -- Neutrophils"
##  [9,] "Percent of sample -- NK cells"
## [10,] "Cells per sample -- CD4 T cells"
## [11,] "Cells per sample -- CD8 T cells"
## [12,] "Cells per sample -- Infil Macrophages"
## [13,] "Cells per sample -- Microglia"
## [14,] "Cells per sample -- Neutrophils"
## [15,] "Cells per sample -- NK cells"
## [16,] "MFI of CD11b_asinh -- CD4 T cells"
## [17,] "MFI of CD11b_asinh -- CD8 T cells"
## [18,] "MFI of CD11b_asinh -- Infil Macrophages"
## [19,] "MFI of CD11b_asinh -- Microglia"
## [20,] "MFI of CD11b_asinh -- Neutrophils"
## [21,] "MFI of CD11b_asinh -- NK cells"
## [22,] "MFI of Ly6C_asinh -- CD4 T cells"
## [23,] "MFI of Ly6C_asinh -- CD8 T cells"
## [24,] "MFI of Ly6C_asinh -- Infil Macrophages"
## [25,] "MFI of Ly6C_asinh -- Microglia"
## [26,] "MFI of Ly6C_asinh -- Neutrophils"
## [27,] "MFI of Ly6C_asinh -- NK cells"
        annot.cols <- c(group.col, batch.col)

Specify which columns we want to plot.

        plot.cols <- names(sum.dat)[c(4:27)]
        plot.cols
##  [1] "Percent of sample -- CD4 T cells"
##  [2] "Percent of sample -- CD8 T cells"
##  [3] "Percent of sample -- Infil Macrophages"
##  [4] "Percent of sample -- Microglia"
##  [5] "Percent of sample -- Neutrophils"
##  [6] "Percent of sample -- NK cells"
##  [7] "Cells per sample -- CD4 T cells"
##  [8] "Cells per sample -- CD8 T cells"
##  [9] "Cells per sample -- Infil Macrophages"
## [10] "Cells per sample -- Microglia"
## [11] "Cells per sample -- Neutrophils"
## [12] "Cells per sample -- NK cells"
## [13] "MFI of CD11b_asinh -- CD4 T cells"
## [14] "MFI of CD11b_asinh -- CD8 T cells"
## [15] "MFI of CD11b_asinh -- Infil Macrophages"
## [16] "MFI of CD11b_asinh -- Microglia"
## [17] "MFI of CD11b_asinh -- Neutrophils"
## [18] "MFI of CD11b_asinh -- NK cells"
## [19] "MFI of Ly6C_asinh -- CD4 T cells"
## [20] "MFI of Ly6C_asinh -- CD8 T cells"
## [21] "MFI of Ly6C_asinh -- Infil Macrophages"
## [22] "MFI of Ly6C_asinh -- Microglia"
## [23] "MFI of Ly6C_asinh -- Neutrophils"
## [24] "MFI of Ly6C_asinh -- NK cells"

Reorder the data such that sample appear in the specify group order.

    ### Reorder summary data and SAVE

        sum.dat <- do.reorder(sum.dat, group.col, grp.order)
        sum.dat[,c(1:3)]
##         Sample Group Batch
##  1: 01_Mock_01  Mock     A
##  2: 02_Mock_02  Mock     B
##  3: 03_Mock_03  Mock     B
##  4: 04_Mock_04  Mock     A
##  5: 05_Mock_05  Mock     A
##  6: 06_Mock_06  Mock     B
##  7:  07_WNV_01   WNV     A
##  8:  08_WNV_02   WNV     B
##  9:  09_WNV_03   WNV     A
## 10:  10_WNV_04   WNV     A
## 11:  11_WNV_05   WNV     B
## 12:  12_WNV_06   WNV     A
        fwrite(sum.dat, 'sum.dat.csv')

Graphs

We can use the run.autograph function to create violin/scatter plots with embedded statistic – one per population/measurement type.

    ### Autographs

        for(i in plot.cols){

            measure <- gsub("\\ --.*", "", i)
            measure

            pop <- gsub("^[^--]*.-- ", "", i)
            pop

            make.autograph(sum.dat,
                           x.axis = group.col,
                           y.axis = i,
                           y.axis.label = measure,

                           grp.order = grp.order,
                           my_comparisons = comparisons,

                           Variance_test = variance.test,
                           Pairwise_test = pairwise.test,

                           title = pop,
                           subtitle = measure,
                           filename = paste0(i, '.pdf'))

        }


Heatmaps

We can also create a global heatmap show the z-score of each population/measurement type against each sample.

    ### Create a fold change heatmap

        ## Z-score calculation
        sum.dat.z <- do.zscore(sum.dat, plot.cols)

        ## Group
        t.first <- match(grp.order, sum.dat.z[[group.col]])
        t.first <- t.first -1
        t.first
## [1] 0 6
        ## Make heatmap
        make.pheatmap(sum.dat.z,
                      sample.col = sample.col,
                      plot.cols = paste0(plot.cols, '_zscore'),
                      is.fold = TRUE,
                      plot.title = 'Z-score',
                      annot.cols = annot.cols,
                      dendrograms = 'column',
                      row.sep = t.first,
                      cutree_cols = 3)



8. Save session info


#######################################################################################################
#### 8. Output session info
#######################################################################################################

For the final step of our setup, we want to record the session info our R session, and save this in a folder we’ll call “Output-info”.

   ### Session info and metadata

        setwd(OutputDirectory)
        dir.create("Output - info", showWarnings = FALSE)
        setwd("Output - info")

        sink(file = "session_info.txt", append=TRUE, split=FALSE, type = c("output", "message"))
        session_info()
        sink()

        write(cellular.cols, "cellular.cols.txt")
        write(cluster.cols, "cluster.cols.txt")